ABSTRACT


In models of reliability growth in stages, it is usual to assume that
system parameters improve monotonically from stage to stage, following
some postulated law of growth. This paper explores a Bayesian model
where such improvement only occurs on the average, e.g., a case when
the parameters are assumed to be stochastically ordered. It is shown
that the problem can be recast into a hierarchical form in which there
are strictly-ordered hyperparameters which index the admissible family
of ordered distributions for the parameters; the modelling problem is
then to describe an appropriate law of motion over the hyperparameters.


STOCHASTICALLY-ORDERED PARAMETERS IN BAYESIAN PREDICTION


by

William S. Jewell


0. INTRODUCTION

Models of reliability growth have received increasing attention in
recent literature. One approach is to assume an underlying continuous
improvement in system parameters; we call this the learning-curve approach
(Jewell [2]). A second model, which may be more appropriate under certain
circumstances, is growth by stages, in which the parameters increase (or
decrease) monotonically at certain fixed points in time, when design
improvements are made. In both of these models, an important problem
is the prediction of the ultimate performance of the system after all
reliability growth has occurred or all the design modifications have been
effected; in practical procurement problems, it is desirable to make
these predictions using as little test data as possible.

This paper explores an extension of the growth-by-stages model
suggested by N. Singpurwalla [3] in which the parameters improve only
stochastically, rather than absolutely.


1. DETERMINISTICALLY-ORDERED PARAMETERS


Consider the usual set-up, in which random observables $x_1, x_2, \ldots, x_t$
in periods 1, 2, ..., t are governed by random parameters
$\theta_1, \theta_2, \ldots, \theta_t$ according to some likelihood density
$p(x_1, x_2, \ldots, x_t \mid \theta_1, \theta_2, \ldots, \theta_t)$. Given a
prior parameter density $p(\theta_1, \theta_2, \ldots, \theta_t)$, and
observations $x = (x_1, x_2, \ldots, x_t)$, a straightforward application of
Bayes' law calculates the posterior parameter density
$p(\theta_1, \theta_2, \ldots, \theta_t \mid x)$.

In most applications, $\theta_i$ is assumed sufficient for $x_i$, so that
the likelihood takes the simpler form:


(1.1)  $p(x_1, x_2, \ldots, x_t \mid \theta_1, \theta_2, \ldots, \theta_t) = \prod_{i=1}^{t} p_i(x_i \mid \theta_i)$ ;


we retain explicit dependence upon the time period primarily to permit
stages of different duration, but not usually different failure laws in
each interval.

In the usual growth-by-stages model, the improvement is expressed
in terms of a deterministic ordering of the parameters, say:


(1.2)  $\theta_1 \ge \theta_2 \ge \cdots \ge \theta_t$ ,


and this is handled in an obvious way through the definition of the
parameter prior and its support. A usual simplification is to fix
some $\theta_0$, and then suppose that $\theta_i$ depends only upon
$\theta_{i-1}$, and not upon other parameters or the index i, so that:


(1.3)  $p(\theta_1, \theta_2, \ldots, \theta_t) = \prod_{i=1}^{t} p(\theta_i \mid \theta_{i-1})$   $(\theta_0 \ge \theta_1 \ge \theta_2 \ge \cdots \ge \theta_t)$ .

Typically a prototype parameter density $g(u)$, $0 < u < 1$, is selected
for modelling convenience, giving a common form to the "shrinkage"


2. STOCHASTICALLY-ORDERED PARAMETERS


N. Singpurwalla [3] has suggested that (1.2) might reasonably
be replaced by the weaker hypothesis that


(2.1)  $\theta_1 \gtrsim \theta_2 \gtrsim \cdots \gtrsim \theta_t$ .


This might mean, for example, that while the failure rate tended
to decrease from stage to stage, such growth was by no means certain
in a given realization. We now modify the basic model to handle this
assumption.

(2.1) implies that the parameters are selected from some family
F of priors which are stochastically ordered over the different stages.

Let $\alpha$ be a scalar hyperparameter which indexes the members of this
family. Then we can associate $\alpha_1, \alpha_2, \ldots, \alpha_t$ with
stages 1, 2, ..., t, and arrange that

(2.2)  $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_t$

(or an increasing version) selects an appropriately ordered distribution
for each parameter. Specifically, F is a family of conditional prior
densities $p(\theta \mid \alpha)$, with complementary distribution
$P^c(\theta \mid \alpha)$, such that, for every value of $\theta$ and every
i < j (i, j = 1, 2, ...),

(2.3)  $P^c(\theta \mid \alpha_i) \ge P^c(\theta \mid \alpha_j)$ ;

this can be visualized as in Figure 1. $\alpha$ can be an actual
hyperparameter of the (conditional) prior, or it can be an abstract index
that selects one member of an admissible family of ordered priors. The key
point is to arrange things so that (2.2) guarantees (2.1).
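
As an illustration of how (2.2) can guarantee (2.1), the following minimal sketch checks condition (2.3) numerically for one hypothetical choice of the family F: exponential conditional priors with mean $\alpha$, so that $P^c(\theta \mid \alpha) = e^{-\theta/\alpha}$. The exponential form, the function names, and the test grid are assumptions made for illustration only; they are not part of the model above.

```python
import math

def survival(theta, alpha):
    """P^c(theta | alpha) for an (assumed) exponential conditional prior with mean alpha."""
    return math.exp(-theta / alpha)

def is_stochastically_ordered(alphas, thetas):
    """Check (2.3): P^c(theta | alpha_i) >= P^c(theta | alpha_j) for all i < j."""
    return all(
        survival(th, alphas[i]) >= survival(th, alphas[j])
        for i in range(len(alphas))
        for j in range(i + 1, len(alphas))
        for th in thetas
    )

thetas = [0.1 * k for k in range(1, 101)]   # test points on (0, 10]
decreasing = [2.0, 1.5, 1.0, 0.5]           # hyperparameters satisfying (2.2)
shuffled = [0.5, 2.0, 1.0]                  # hyperparameters violating (2.2)

print(is_stochastically_ordered(decreasing, thetas))
print(is_stochastically_ordered(shuffled, thetas))
```

For this particular family the ordering of the hyperparameters carries over to every test point, while the shuffled sequence fails the check at the first pair.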
FIGURE 1

ADMISSIBLE FAMILY, F, OF STOCHASTICALLY ORDERED CONDITIONAL PRIORS ON $\theta$

With (2.1) guaranteed, what can we say about the selection of
the hyperparameters? In a Bayesian model we must now express our
prior beliefs and previous experience about what values of
$\alpha_1, \alpha_2, \ldots, \alpha_t$ might occur during the different
stages. In other words, we consider the hyperparameters as random
variables with a given hyperprior density
$p(\alpha_1, \alpha_2, \ldots, \alpha_t)$ over the family F satisfying (2.2).

Effectively, the stochastic ordering has introduced a hierarchical
Bayesian formulation (see, e.g., Jewell [1]), in which we again have
deterministic ordering, this time in the hyperparameters. Note also
that this approach avoids the "pseudo-Bayesian" formulation described
by Singpurwalla [3].

For convenience in the sequel, we assume a Markovian hyperprior
similar to (1.3), so that, given $\alpha_0$, and a family of (hyperprior)
transition densities $p(\alpha_i \mid \alpha_{i-1})$, we have:

(2.4)  $p(\alpha_1, \alpha_2, \ldots, \alpha_t) = \prod_{i=1}^{t} p(\alpha_i \mid \alpha_{i-1})$   $(\alpha_0 \ge \alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_t)$ .

($\alpha_0$ could, of course, have an initial density of its own. But, for
simplicity, we omit further explicit mention of $\alpha_0$.)

In summary, we have the following model (2.5):

(a) Prior beliefs or experience specify:

    (i) $\alpha_0$ and the hyperprior transition densities
        $p(\alpha_i \mid \alpha_{i-1})$ with deterministic ordering (2.2);

    (ii) the family F of admissible conditional priors, so that,
         given $\alpha_i$, $\theta_i$ has complementary distribution
         $P^c(\theta_i \mid \alpha_i)$ and stochastic ordering (2.1);

(b) A known failure law governs the different observational
    likelihoods $p_i(x_i \mid \theta_i)$.

Note that $\alpha_i$ is sufficient for $\theta_i$ which is, in turn,
sufficient for $x_i$, so that, given $\alpha = (\alpha_1, \ldots, \alpha_t)$,
the problem of predicting $\theta = (\theta_1, \ldots, \theta_t)$ and/or
$x = (x_1, \ldots, x_t)$ decomposes into several separate problems. However,
the usual situation is that we are given the observations $x$, and want to
update the distributions of the unobserved $\theta$ and $\alpha$, or to
predict future observations $x_{t+1}, x_{t+2}, \ldots$. Because of the
Markovian dependence of the $\alpha_i$, both of these problems require
extensive applications of Bayes' law.


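3. MARGINAL DENSITIES

As a preliminary, we indicate how to calculate the various marginal
densities. For the parameters, the joint marginal is:


(3.1)  $p(\theta_1, \theta_2, \ldots, \theta_t) = \int\!\!\int \cdots \int \prod_{i=1}^{t} p(\theta_i \mid \alpha_i)\, p(\alpha_i \mid \alpha_{i-1})\, d\alpha_i$ .


Perhaps the easiest way to calculate is by defining:


(3.2)  $p(\theta_1 ; \alpha_2) = \int p(\theta_1 \mid \alpha_1)\, p(\alpha_2 \mid \alpha_1)\, p(\alpha_1 \mid \alpha_0)\, d\alpha_1$ ,


and then proceed iteratively to find:


(3.3)  $p(\theta_1, \theta_2, \ldots, \theta_i ; \alpha_{i+1}) = \int p(\theta_i \mid \alpha_i)\, p(\alpha_{i+1} \mid \alpha_i)\, p(\theta_1, \theta_2, \ldots, \theta_{i-1} ; \alpha_i)\, d\alpha_i$

for i = 2, 3, ... . At any point t where the marginal density is
desired,


(3.4)  $p(\theta_1, \theta_2, \ldots, \theta_t) = \int p(\theta_t \mid \alpha_t)\, p(\theta_1, \theta_2, \ldots, \theta_{t-1} ; \alpha_t)\, d\alpha_t$ .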


It is clear that the parameters have a stochastic dependence
which is not Markovian (as in (1.3)). One can show, however, that the
nature of F, expressed in (2.3), implies ordered marginals:


(3.5)  $\Pr\{\theta_i \ge \theta\} \ge \Pr\{\theta_j \ge \theta\}$ ,  (i < j)  ($\forall\, \theta$)


as expected.
For the observables $\{x_i\}$, the intermediate nature of the $\{\theta_i\}$
suggests we define "bypass" conditional densities from the likelihood
and prior, viz.,


(3.6)  $p_i(x_i \mid \alpha_i) = \int p_i(x_i \mid \theta_i)\, p(\theta_i \mid \alpha_i)\, d\theta_i$


for i = 1, 2, ... . The joint marginal of the observations is then
similar to (3.1):


(3.7)  $p(x_1, x_2, \ldots, x_t) = \int\!\!\int \cdots \int \prod_{i=1}^{t} p_i(x_i \mid \alpha_i)\, p(\alpha_i \mid \alpha_{i-1})\, d\alpha_i$ ,


which can also be "bootstrapped" as in (3.2), (3.3). Alternately, if
(3.1) or (3.4) were first calculated,


(3.8)  $p(x_1, x_2, \ldots, x_t) = \int\!\!\int \cdots \int p(\theta_1, \theta_2, \ldots, \theta_t) \prod_{i=1}^{t} p_i(x_i \mid \theta_i)\, d\theta_i$ .
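
The bypass density (3.6) can sometimes be evaluated in closed form. The sketch below assumes, purely for illustration, an exponential failure law with rate $\theta$ and a gamma conditional prior on $\theta$ with fixed shape and with scale equal to the hyperparameter $\alpha$; under these assumptions the integral in (3.6) works out to $k\alpha / (1 + \alpha x)^{k+1}$ for shape k. The code compares that expression with a direct trapezoidal evaluation of (3.6). All function names and numerical settings are hypothetical.

```python
import math

def gamma_pdf(theta, shape, scale):
    """Gamma density, used here as the (assumed) conditional prior p(theta | alpha)."""
    return (theta ** (shape - 1) * math.exp(-theta / scale)
            / (math.gamma(shape) * scale ** shape))

def bypass_numeric(x, shape, scale, upper=60.0, steps=60000):
    """Trapezoidal approximation of (3.6) with likelihood theta * exp(-theta * x)."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        theta = k * h
        f = theta * math.exp(-theta * x) * gamma_pdf(theta, shape, scale)
        total += f if 0 < k < steps else 0.5 * f   # half weight at both end points
    return total * h

def bypass_closed_form(x, shape, scale):
    """Closed form of (3.6) under the same assumptions."""
    return shape * scale / (1.0 + scale * x) ** (shape + 1)

for alpha in (1.0, 3.0):
    print(alpha, bypass_numeric(0.5, 2.0, alpha), bypass_closed_form(0.5, 2.0, alpha))
```

The truncation point `upper` and the step count are chosen only so that the neglected tail is negligible for these parameter values; shapes below 1 would need a different quadrature near zero.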
4. UPDATING HYPERPARAMETERS

We turn now to the calculation of the inverse probabilities.
By using the bypass conditional densities of (3.6), we find that
(given $\alpha_0$):


(4.1)  $p(\alpha_1, \ldots, \alpha_t,\; x_1, x_2, \ldots, x_t) = \prod_{i=1}^{t} p_i(x_i \mid \alpha_i)\, p(\alpha_i \mid \alpha_{i-1})$ ,


so that the update formula for the hyperparameters (i.e., the
"hyperposterior-to-the-data" density) is:


(4.2)  $p(\alpha_1, \ldots, \alpha_t \mid x) = \dfrac{\prod_{i=1}^{t} p_i(x_i \mid \alpha_i)\, p(\alpha_i \mid \alpha_{i-1})}{p(x)}$ .


$p(x)$ is calculated in (3.7), but in most practical calculations, it is
treated simply as a normalization factor. Note that "later" observables
provide information about "earlier" hyperparameters because of the
Markovian dependence.
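
The remark about later observables informing earlier hyperparameters can be seen in a small discretized sketch of (4.1) and (4.2). Here the hyperparameter is restricted to a finite grid, so that densities become probability masses and integrals become sums; the grid, the transition matrix (which only allows moves to equal or smaller values, as in (2.2)), and the exponential bypass density are all assumptions chosen for illustration.

```python
import math
from itertools import product

def hyperposterior(xs, grid, init, trans, lik):
    """Return {path: p(alpha_1..alpha_t | x)} on a finite grid, following (4.1)-(4.2).

    init[k]     = p(alpha_1 = grid[k] | alpha_0)
    trans[j][k] = p(alpha_{i+1} = grid[k] | alpha_i = grid[j])
    lik(x, a)   = bypass density p_i(x | alpha = a)
    """
    n = len(grid)
    weights = {}
    for path in product(range(n), repeat=len(xs)):
        w = init[path[0]] * lik(xs[0], grid[path[0]])
        for i in range(1, len(xs)):
            w *= trans[path[i - 1]][path[i]] * lik(xs[i], grid[path[i]])
        weights[path] = w                       # joint weight (4.1)
    px = sum(weights.values())                  # normalization factor p(x)
    return {path: w / px for path, w in weights.items()}

def marginal_first(post, k):
    """Posterior probability that alpha_1 sits at grid index k."""
    return sum(p for path, p in post.items() if path[0] == k)

grid = [2.0, 1.0]                               # hypothetical hyperparameter levels
init = [0.5, 0.5]
trans = [[0.5, 0.5], [0.0, 1.0]]                # no move back up from 1.0 to 2.0
lik = lambda x, a: a * math.exp(-a * x)         # assumed bypass density

one = hyperposterior([1.0], grid, init, trans, lik)
two = hyperposterior([1.0, 1.0], grid, init, trans, lik)
print(marginal_first(one, 0), marginal_first(two, 0))
```

With these numbers the posterior probability that $\alpha_1$ equals 2.0 drops from about 0.42 after one observation to about 0.39 after two, even though the second observation belongs to stage 2. Enumerating all paths is only practical for very short sequences.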
5. PREDICTING HYPERPARAMETERS

The prediction of $\alpha_{t+1}$ from $x$ is most easily "bootstrapped" in a
manner similar to (3.2), (3.3). We define:

(5.1)  $p(x_1 ; \alpha_2) = \int p_1(x_1 \mid \alpha_1)\, p(\alpha_2 \mid \alpha_1)\, p(\alpha_1 \mid \alpha_0)\, d\alpha_1$ ,

and then proceed iteratively for i = 2, 3, ... :


(5.2)  $p(x_1, x_2, \ldots, x_i ; \alpha_{i+1}) = \int p_i(x_i \mid \alpha_i)\, p(\alpha_{i+1} \mid \alpha_i)\, p(x_1, x_2, \ldots, x_{i-1} ; \alpha_i)\, d\alpha_i$ .


At any stage t, the prediction of $\alpha_{t+1}$ is a simple normalization,
viz.,


(5.3)  $p(\alpha_{t+1} \mid x) = \dfrac{p(x_1, x_2, \ldots, x_t ; \alpha_{t+1})}{p(x)}$ .


The iterative calculations (5.1), (5.2) provide also a simple formula for
updating the most recent hyperparameter:


(5.4)  $p(\alpha_t \mid x) = \dfrac{p_t(x_t \mid \alpha_t)\, p(x_1, x_2, \ldots, x_{t-1} ; \alpha_t)}{p(x)}$ .
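
The bootstrap (5.1) to (5.4) can be sketched on a finite grid of hyperparameter values, where each integral becomes a sum. The grid, the downward-only transition matrix, the starting distribution, and the exponential bypass density used below are hypothetical choices made only to give a runnable example.

```python
import math

def forward(xs, grid, init, trans, lik):
    """Discretized bootstrap (5.1)-(5.4) for a non-empty observation list xs.

    Returns (p(alpha_t | x), p(alpha_{t+1} | x), p(x)) as lists over the grid.
    """
    n = len(grid)
    carried = list(init)   # p(x_1..x_{i-1}; alpha_i); for i = 1 this is p(alpha_1 | alpha_0)
    for x in xs:
        # numerator of (5.4): p_i(x_i | alpha_i) * p(x_1..x_{i-1}; alpha_i)
        joint = [lik(x, grid[j]) * carried[j] for j in range(n)]
        # (5.1), (5.2): push through one hyperprior transition
        carried = [sum(joint[j] * trans[j][k] for j in range(n)) for k in range(n)]
    px = sum(joint)                              # normalization factor p(x)
    filtered = [v / px for v in joint]           # (5.4)
    forecast = [v / px for v in carried]         # (5.3)
    return filtered, forecast, px

grid = [3.0, 2.0, 1.0]                           # hypothetical levels, largest first
init = [0.0, 1.0, 0.0]                           # alpha_1 = 2.0 with certainty
trans = [[0.6, 0.3, 0.1],
         [0.0, 0.7, 0.3],
         [0.0, 0.0, 1.0]]                        # moves only to equal or smaller levels
lik = lambda x, a: a * math.exp(-a * x)          # assumed bypass density

filtered, forecast, px = forward([0.4, 0.9, 1.5], grid, init, trans, lik)
print(filtered, forecast, px)
```

Because the transition matrix respects the ordering (2.2), the level 3.0 never receives any posterior or forecast mass once the chain has started at 2.0. The cost of the recursion is linear in the number of stages, in contrast to enumerating every hyperparameter path.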
6. UPDATING PARAMETERS


It does not seem easy to provide updating for the parameters alone,
since $p(\theta_1, \ldots, \theta_t \mid x)$ requires a complex integration
over the Markovian transition probabilities. If all the marginals (3.2),
(3.3) have been calculated to give (3.4), then, of course,


(6.1)  $p(\theta_1, \theta_2, \ldots, \theta_t \mid x) = \dfrac{\prod_{i=1}^{t} p_i(x_i \mid \theta_i)\, p(\theta_1, \theta_2, \ldots, \theta_t)}{p(x)}$ .


For the latest parameter only, the "bootstrap" in (5.2) can be used with


(6.2)  $p(\theta_t \mid x) = \dfrac{\int p_t(x_t \mid \theta_t)\, p(\theta_t \mid \alpha_t)\, p(x_1, x_2, \ldots, x_{t-1} ; \alpha_t)\, d\alpha_t}{p(x)}$ .


All data must be used to update $\theta_t$. However, if we also know
$\alpha_t$, then only the latest observation is sufficient, as can be seen
from:


(6.3)  $p(\theta_t \mid \alpha_t, x) = \dfrac{p_t(x_t \mid \theta_t)\, p(\theta_t \mid \alpha_t)}{p_t(x_t \mid \alpha_t)} = p(\theta_t \mid \alpha_t, x_t)$ .


7. OTHER PREDICTIONS


In fact, the hyperparameter forecast density (5.3) is the key to
all other predictions: given $x$, only $\alpha_{t+1}$ is sufficient
for the next period, as can be seen from:


(7.1)  $p(x_{t+1}, \theta_{t+1}, \alpha_{t+1} \mid x) = p_{t+1}(x_{t+1} \mid \theta_{t+1})\, p(\theta_{t+1} \mid \alpha_{t+1})\, p(\alpha_{t+1} \mid x)$ ;


marginal versions follow by integration.

For stages in the future, this generalizes to


(7.2)  $p(x_{t+u}, \theta_{t+u}, \alpha_{t+u} \mid x) = p_{t+u}(x_{t+u} \mid \theta_{t+u})\, p(\theta_{t+u} \mid \alpha_{t+u}) \cdot \int p^{(u-1)}(\alpha_{t+u} \mid \alpha_{t+1})\, p(\alpha_{t+1} \mid x)\, d\alpha_{t+1}$

for u = 2, 3, ... , where $p^{(n)}(\cdot \mid \cdot)$ is the usual n-step
Markovian transition probability.
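
On a finite grid of hyperparameter values, the integral in (7.2) reduces to multiplying the forecast vector for $\alpha_{t+1}$ by a power of the one-step transition matrix. The following sketch assumes such a discretization; the two-level transition matrix and the forecast vector are hypothetical numbers.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(trans, n):
    """n-step transition matrix p^(n); n = 0 gives the identity."""
    size = len(trans)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, trans)
    return result

def hyper_forecast(forecast_next, trans, u):
    """p(alpha_{t+u} | x) from p(alpha_{t+1} | x): the integral in (7.2) as a sum."""
    step = n_step(trans, u - 1)
    size = len(trans)
    return [sum(forecast_next[j] * step[j][k] for j in range(size))
            for k in range(size)]

trans = [[0.5, 0.5], [0.0, 1.0]]          # assumed one-step hyperprior transitions
forecast_next = [0.4, 0.6]                # assumed p(alpha_{t+1} | x) from (5.3)
print(hyper_forecast(forecast_next, trans, 3))
```

For u = 1 the function returns the forecast (5.3) unchanged; the remaining factors of (7.2) are then applied to each grid value separately.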
8. MODEL VARIATIONS

The model described in (2.5) is actually quite general. First
of all, it includes the deterministically-ordered parameter model of
Section 1 by making the conditional prior, $p(\theta \mid \alpha)$, a
degenerate density, i.e., $\theta_t = \alpha_t$ for all t; the transition
densities of (2.4) are then those of (1.3).

If, on the other hand, we assume a time-invariant hyperparameter
$\alpha = \alpha_0 = \alpha_1 = \cdots = \alpha_t$, with initial hyperprior
density $\pi(\alpha)$, then the model reduces to a hypothesis mixing
formulation, in which data from successive trials help us to select one
of a (usually finite) number of priors $p(\theta_i \mid \alpha)$. The
parameter $\theta_i$ is usually also stationary, and the calculations
reduce to updating posterior-to-data mixing probabilities
$\pi(\alpha \mid x)$.
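
A minimal sketch of this hypothesis mixing update, for a finite set of candidate hyperparameter values, is given below. It treats the observations as conditionally independent given $\alpha$ with a common bypass density, which is an assumption made for illustration, as are the exponential density and the numbers used.

```python
import math

def mixing_posterior(xs, alphas, prior, lik):
    """pi(alpha | x), proportional to pi(alpha) * prod_i p(x_i | alpha)."""
    weights = [p * math.prod(lik(x, a) for x in xs)
               for a, p in zip(alphas, prior)]
    total = sum(weights)                      # normalization factor p(x)
    return [w / total for w in weights]

alphas = [2.0, 1.0]                           # hypothetical candidate hyperparameters
prior = [0.5, 0.5]                            # initial mixing probabilities pi(alpha)
lik = lambda x, a: a * math.exp(-a * x)       # assumed bypass density p(x | alpha)

print(mixing_posterior([1.0, 1.0], alphas, prior, lik))
```

Each new observation simply reweights the fixed set of hypotheses; no transition step is needed because the hyperparameter does not move.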

The strict ordering (2.2) only applies if there is a scalar
hyperparameter in the prior (or if it is an abstract index over F).
Figure 2 shows the situation if the hyperparameter is a two-vector,
$\alpha_t = (\alpha_{1,t}, \alpha_{2,t})$. From the form of the conditional
prior $p(\theta \mid \alpha)$, we can in principle find the allowed region,
$R(\alpha_{t-1})$, for $\alpha_t$ such that $\theta_t \lesssim \theta_{t-1}$,
as shown. This region depends upon $\alpha_{t-1}$ and includes it; in other
words, the stochastic ordering of the parameters induces a partial ordering
over the vector-valued hyperparameters. The transition probabilities
$p(\alpha_t \mid \alpha_{t-1})$ then provide the appropriate measure over
$R(\alpha_{t-1})$.

We also see, upon reflection, that our formulation can produce
many other forms of parameter relationships than (1.2) or (2.1),
through appropriate choices of R. For instance, if $p(\theta \mid \alpha)$
is a gamma density, an ordering of the shape parameter, keeping the mean
fixed, will give a family F of priors with ordered coefficients
of variation, ranging from exponential to degenerate densities.
Other variations are left to the reader's imagination.
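
The gamma example can be checked with a few lines of arithmetic. For a gamma density with shape k and fixed mean m, the scale is m/k and the variance is m^2/k, so the coefficient of variation is $1/\sqrt{k}$: it equals 1 at k = 1 (the exponential case) and tends to 0 as k grows (the degenerate case). The function name and the sample values below are illustrative only.

```python
import math

def gamma_cv(mean, shape):
    """Coefficient of variation of a gamma density with the given mean and shape."""
    scale = mean / shape                 # keeps the mean fixed as the shape changes
    variance = shape * scale ** 2
    return math.sqrt(variance) / mean

for shape in [1, 4, 25, 10000]:
    print(shape, gamma_cv(2.0, shape))
```

The printed values decrease monotonically with the shape parameter, which is the ordered family of coefficients of variation described above.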


9. APPLICATION


In spite of its generality, this approach has serious practical
drawbacks. The first is that it is difficult to construct analytic
models in which the hierarchical computations can be carried out
explicitly; even with a natural conjugate prior on $\theta$, there
remains the problem of specifying the hyperparameter density in a
convenient form. ([4] illustrates some of these problems using the
shrinkage (1.4).)

Secondly, digital computations require an additional dimension
of the discretized space, so that it is difficult to approximate the
usual continuous densities. Limited computational experience with
small, rectangular priors and hyperpriors indicated that the growth
under stochastic ordering behaves as expected, with an occasional
reversal of growth, but otherwise was not illuminating.

Finally, of course, we should question the hypothesis of stochastic
ordering itself. A decision-maker might believe that improvement
occurs "on the average", for instance, but would prefer to model the
anti-growth situations in a completely different way, with, say, (2.1)
replaced by an ordering of the means only. We doubt seriously that
any real data could discriminate between two such competing models.